Prob-Maxn: Playing N-Player Games with Opponent Models
Authors

Abstract
Much of the work on opponent modeling for game-tree search has been unsuccessful. In two-player, zero-sum games, the gains from opponent modeling are often outweighed by the cost of modeling. Opponent modeling solutions simply cannot search as deeply as highly optimized minimax search with alpha-beta pruning. Recent work has begun to look at the need for opponent modeling in n-player or general-sum games. We introduce a probabilistic approach to opponent modeling in n-player games called prob-maxn, which can robustly adapt to unknown opponents. We implement prob-maxn in the game of Spades, showing that prob-maxn is highly effective in practice, beating out the maxn and soft-maxn algorithms when faced with unknown opponents.

Introduction and Background

Researchers have often observed deficiencies in the minimax algorithm and its approach to game playing. Russell and Norvig (1995), for instance, gave a prominent example of where minimax play can be flawed by slight errors in the values of leaf positions. Others have shown that minimax search can be pathological, returning less accurate results as search depth increases (Beal 1982; Nau 1982). While new algorithms have been designed for better analysis of games (Russell & Wefald 1991; Baum & Smith 1997) or for opponent modeling (Carmel & Markovitch 1996), these approaches have not been widely used in practice. There are a variety of reasons for this, but the primary one seems to be that minimax with alpha-beta pruning is simple to implement and adequate for most analysis.

In this paper we turn the research focus from two-player, zero-sum games to n-player, general-sum games. Much less research has gone into this area, but problems in this domain are much more suitable for incorporating additional information such as opponent models. We extend the results of our previous work (Sturtevant & Bowling 2006), which showed that opponent modeling is needed for n-player games, by introducing prob-maxn.
Prob-maxn is a search algorithm in the tradition of maxn, but it makes use of probabilistic models of the opponents in the search. We also show how the probabilistic models can form the basis for learning models during play, through Bayesian inference. In the game of Spades we demonstrate that prob-maxn is superior to existing approaches.

Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Opponent Modeling Algorithms

Early work in opponent modeling focused on the problem of recursive modeling (Korf 1989; Iida et al. 1993a; 1993b). While this early work is interesting, it has not made its way into use by current game-playing programs. Carmel and Markovitch (1996), for instance, look at the performance of a checkers program using opponent modeling. But CHINOOK, which is considered the best program in this domain, does not use explicit opponent modeling. Instead, it relies on other techniques to achieve high performance. Donkers and colleagues (2001) take a more probabilistic approach to opponent modeling, which is somewhat similar to the approach we take in this paper. We will address these differences after we have presented our new work.

We believe that one reason these approaches haven't found success in practice is that they have been applied to two-player, zero-sum games. From a practical and theoretical point of view these games are much easier than general-sum games, and thus there is much less need to model one's opponent. We demonstrate a domain where, even given a perfect evaluation function (we search to the end of the game tree), we need to take into account a model of our opponent.

Motivating Example: Spades

Spades is a card game for two or more players. For this research, we consider the three-player version of the game, where there are no partnerships.
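The Bayesian-inference idea mentioned above can be illustrated with a small sketch. This is not the paper's exact formulation: the player-type names and per-move likelihoods below are invented for the example, and we simply assume that each candidate opponent model assigns a probability to every observed action.

```python
# Illustrative sketch of learning an opponent model by Bayes' rule: keep a
# posterior over a small set of candidate opponent types and update it after
# each observed action. Type names and likelihood numbers are hypothetical.

def bayes_update(prior, likelihoods):
    """prior: dict type -> P(type); likelihoods: dict type -> P(action | type)."""
    unnorm = {t: prior[t] * likelihoods[t] for t in prior}
    total = sum(unnorm.values())
    return {t: p / total for t, p in unnorm.items()}

# Start maximally uncertain between two hypothetical player types.
belief = {"minimize_overtricks": 0.5, "maximize_tricks": 0.5}

# Suppose an observed move was far more probable under the first model.
belief = bayes_update(belief, {"minimize_overtricks": 0.8,
                               "maximize_tricks": 0.2})
print(belief["minimize_overtricks"])  # 0.8
```

Repeating the update over many observed moves concentrates the belief on whichever candidate model best explains the opponent's play, which is the sense in which a probabilistic search can "adapt to unknown opponents."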
The majority of the rules of Spades are not relevant for this work, and there are any number of other games, such as Oh Hell, which have similar properties to Spades. We will only cover the most relevant rules of the game here. Each game of Spades is broken up into a number of hands, which are played as independent units. Hands are further broken up into tricks. Before a hand begins, each player must predict, in the form of a bid, how many tricks they expect to take in the following hand. Scores are determined according to whether players make their bids or not. If a player doesn't take as many tricks as they bid, they get a score of −10×bid. If they take at least as many tricks as they bid, they get 10×bid. The caveat is that the tricks taken over a player's bid (overtricks) are also tallied, and when, over the course of a game, a player takes 10 overtricks, they lose 100 points. Thus, the goal of the game is to make your bid without taking too many overtricks.

Spades is an imperfect-information game because players are not allowed to see their opponents' cards. One common approach to playing imperfect-information games is to use Monte Carlo sampling to generate perfect-information hands which can then be analyzed. While there are some drawbacks to this approach, it has been used successfully in domains like Bridge (Ginsberg 2001). Because this approach works well, we focus our new work on the perfect-information game, and all experiments in this paper are played with open hands, meaning that players can see each other's cards.

Importance of Modeling

To help motivate this paper we present some previous results from the game of Spades without explaining the full details of how the experiments were set up and run. These details will be duplicated for our current experiments and are covered in the experimental results section of this paper. The trends shown here motivate the practical need for this line of research.
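The scoring rule above is easy to misread, so here is a minimal sketch of it in code. The function name, the return shape, and the decision to charge the 100-point penalty each time the running overtrick tally crosses a multiple of 10 are our own choices for illustration, not a specification from the paper.

```python
# Hedged sketch of the Spades hand-scoring rule described in the text:
# miss the bid -> -10*bid; make the bid -> +10*bid, with overtricks added
# to a running tally that costs 100 points per 10 accumulated overtricks.

def score_hand(bid, tricks_taken, prior_overtricks):
    """Return (points for this hand, updated overtrick tally)."""
    if tricks_taken < bid:            # missed the bid
        hand_score = -10 * bid
        new_total = prior_overtricks  # no overtricks on a missed bid
    else:                             # made the bid; count overtricks
        hand_score = 10 * bid
        new_total = prior_overtricks + (tricks_taken - bid)
    # 100-point penalty each time the tally crosses a multiple of 10
    penalty = 100 * (new_total // 10 - prior_overtricks // 10)
    return hand_score - penalty, new_total

print(score_hand(4, 3, 0))   # missed the bid: (-40, 0)
print(score_hand(3, 6, 8))   # made bid, tally crosses 10: (-70, 11)
```

The second call shows the tension the paper exploits: taking extra tricks "succeeds" locally but can be strictly worse once the overtrick penalty is accounted for, which is exactly why the mOT and MT utility functions below lead to different play.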
Specifically, we consider two different "player types", defined by their utility functions over game outcomes. The first player type, called mOT, tries to minimize overtricks. The second player type, called MT, simply tries to maximize tricks. When doing game-tree search, we must have a model of our opponents. In two-player, zero-sum games we normally assume that our opponent is identical to ourselves. Recent experiments (Sturtevant & Bowling 2006) have shown that this approach is not robust in n-player games.

Consider what happens when these two player types compete, where both have correct opponent models. That is, the mOT players know which opponents are maximizing tricks, and the MT players know which opponents are minimizing overtricks. In this case it is not surprising that an mOT player wins nearly 75% of the games against MT players. What is surprising is that, if each player instead assumes their opponents have the same strategy that they do, an mOT player then wins only 44% of the games. These results are not due to uncertainty in heuristic evaluation: all game trees are searched exhaustively. Instead, there is a fundamental issue of opponent modeling. In 3-player Spades we cannot blindly assume that our opponents employ our same utility function without potentially facing disastrous results. This is in distinct contrast to the very successful use of this principle in two-player, zero-sum games.

Multi-Player Game-Tree Search

The first game-tree search algorithm proposed for n-player games was maxn.

Maxn

Maxn (Luckhardt & Irani 1986) is the generalization of minimax to any number of players, while in a two-player, zero-sum...
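As a rough illustration of the maxn backup rule, here is a minimal sketch: each leaf carries a tuple of payoffs, one per player, and at each interior node the player to move selects the child whose backed-up tuple maximizes that player's own component. The tree encoding and the first-best tie-breaking are our own choices for the example, not the paper's.

```python
# Hedged sketch of the maxn backup rule (Luckhardt & Irani 1986): leaves are
# payoff tuples, interior nodes are lists of children, and players move in
# round-robin order. Tie-breaking (keep the first best child) is arbitrary.

def maxn(node, player, num_players):
    """Return the payoff tuple backed up to this node."""
    if isinstance(node, tuple):          # leaf: a payoff per player
        return node
    best = None
    next_player = (player + 1) % num_players
    for child in node:
        value = maxn(child, next_player, num_players)
        if best is None or value[player] > best[player]:
            best = value                 # player to move maximizes own payoff
    return best

# Tiny 3-player tree: player 0 chooses a subtree, then player 1 chooses a leaf.
tree = [[(3, 2, 5), (1, 6, 3)],
        [(4, 4, 2), (2, 5, 3)]]
print(maxn(tree, 0, 3))  # (2, 5, 3)
```

In the example, player 1 backs up (1, 6, 3) and (2, 5, 3) from the two subtrees, and player 0 then prefers the second because 2 > 1 in its own component. Note that the backup only consults each player's own payoff, which is precisely where an opponent model (what utility the other players are actually maximizing) enters the picture.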
Similar Articles
A Comparison of Algorithms for Multi-player Games
The maxn algorithm (Luckhardt and Irani, 1986) for playing multiplayer games is flexible, but there are only limited techniques for pruning maxn game trees. This paper presents other theoretical limitations of the maxn algorithm, namely that tie-breaking strategies are crucial to maxn, and that zerowindow search is not possible in maxn game trees. We also present quantitative results derived fr...
Mixing Search Strategies for Multi-Player Games
There are two basic approaches to generalize the propagation mechanism of the two-player Minimax search algorithm to multi-player (3 or more) games: the MaxN algorithm and the Paranoid algorithm. The main shortcoming of these approaches is that their strategy is fixed. In this paper we suggest a new approach (called MPMix) that dynamically changes the propagation strategy based on the players’ ...
Universal Voting Protocol Tweaks to Make Manipulation Hard
Robust Opponent Modeling in Real-Time Strategy Games using Bayesian Networks
Opponent modeling is a key challenge in Real-Time Strategy (RTS) games as the environment is adversarial in these games, and the player cannot predict the future actions of her opponent. Additionally, the environment is partially observable due to the fog of war. In this paper, we propose an opponent model which is robust to the observation noise existing due to the fog of war. In order to cope...
Search Policies in Multi-Player Games
In this article we investigate how three multi-player search policies, namely maxn, paranoid, and Best-Reply Search, can be embedded in the MCTS framework. The performance of these search policies is tested in four different deterministic multi-player games with perfect information by running self-play experiments. We show that MCTS with the maxn search policy overall performs best. Furthermore...